R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button, a document is generated that includes both the content and the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.6     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.4     ✓ stringr 1.4.0
## ✓ readr   2.1.0     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

Including Plots

You can also embed plots, for example:

library(gapminder)  # provides the gapminder dataset
str(gapminder)
## tibble [1,704 × 6] (S3: tbl_df/tbl/data.frame)
##  $ country  : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ year     : int [1:1704] 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
##  $ lifeExp  : num [1:1704] 28.8 30.3 32 34 36.1 ...
##  $ pop      : int [1:1704] 8425333 9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415 ...
##  $ gdpPercap: num [1:1704] 779 821 853 836 740 ...
unique(gapminder$year)
##  [1] 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 2002 2007
head(gapminder)
## # A tibble: 6 × 6
##   country     continent  year lifeExp      pop gdpPercap
##   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
## 1 Afghanistan Asia       1952    28.8  8425333      779.
## 2 Afghanistan Asia       1957    30.3  9240934      821.
## 3 Afghanistan Asia       1962    32.0 10267083      853.
## 4 Afghanistan Asia       1967    34.0 11537966      836.
## 5 Afghanistan Asia       1972    36.1 13079460      740.
## 6 Afghanistan Asia       1977    38.4 14880372      786.


For each country and sampled year (every five years from 1952 to 2007), the dataset records the country's continent, life expectancy, population, and GDP per capita.
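As a quick consistency check, the numbers in the `str()` and `unique()` outputs above can be combined: 142 countries observed in 12 sampled years give exactly the 1,704 rows that `str()` reports, which is consistent with one row per country per year. A minimal base-R sketch of that arithmetic (the figures are copied from the printed outputs, not recomputed from the data):

```r
# Sampled years: every fifth year from 1952 to 2007, as listed by unique(gapminder$year)
years <- seq(1952, 2007, by = 5)
length(years)                  # number of sampled years

# 142 is the number of levels of the country factor in the str() output
n_countries <- 142
n_countries * length(years)    # compare with the 1,704 rows reported by str()
```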

Let’s plot all the countries in 1952.

theme_set(theme_bw())  # set theme to white background for better visibility

ggplot(subset(gapminder, year == 1952), aes(gdpPercap, lifeExp, size = pop)) +
  geom_point() +
  scale_x_log10() 

  1. Why does it make sense to use a log10 scale on the x axis? I study history, not maths, so I really don’t know.
  2. Who is the outlier (the richest country in 1952, far right on the x axis)?
gapminder %>%
  filter(year == 1952) %>%
  select(country, gdpPercap) %>%
  arrange(desc(gdpPercap))
## # A tibble: 142 × 2
##    country        gdpPercap
##    <fct>              <dbl>
##  1 Kuwait           108382.
##  2 Switzerland       14734.
##  3 United States     13990.
##  4 Canada            11367.
##  5 New Zealand       10557.
##  6 Norway            10095.
##  7 Australia         10040.
##  8 United Kingdom     9980.
##  9 Bahrain            9867.
## 10 Denmark            9692.
## # … with 132 more rows
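On question 1 above (why a log10 x axis): a log scale positions values by their ratios rather than their differences, so each one-unit step on a log10 axis means ten times more GDP per capita. A minimal base-R sketch using three 1952 values printed in this document (Afghanistan from `head(gapminder)`, the United States and Kuwait from the ranking above, rounded as printed):

```r
# 1952 GDP per capita, rounded as printed in the outputs above
gdp_1952 <- c(Afghanistan = 779, `United States` = 13990, Kuwait = 108382)

# Linear axis: distance between points is the difference between the values,
# so the gap from the United States to Kuwait dwarfs everything else
diff(gdp_1952)

# log10 axis: distance is the difference between log10 values,
# i.e. how many powers of ten separate two countries
log_gdp <- log10(gdp_1952)
round(log_gdp, 2)
round(diff(log_gdp), 2)
```

On a linear axis, the Kuwait outlier would squeeze almost every other country against the left edge; on the log10 axis the same points spread out, and equal distances correspond to equal ratios.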

Next, you can generate a similar plot for 2007 and compare the differences.

ggplot(subset(gapminder, year == 2007), aes(gdpPercap, lifeExp, size = pop)) +
  geom_point() +
  scale_x_log10() 

The black bubbles are a bit hard to read; the comparison would be easier with more visual differentiation.

Tasks:

  1. Differentiate the continents by color, and fix the axis labels and units to be more legible. (Hint: 2.50e+08 is so-called “scientific notation”, which you might want to eliminate.)
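options(scipen = 999)  # set before plotting: prefer whole numbers over scientific notation
ggplot(subset(gapminder, year == 2007), aes(gdpPercap, lifeExp, size = pop)) +
  geom_point(aes(color = continent)) +  # one color per continent
  scale_x_log10() +
  labs(x = "GDP per capita (US$, log scale)", y = "Life expectancy (years)",
       size = "Population", color = "Continent")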
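The hint in this task refers to scientific notation: `2.50e+08` means 2.5 × 10^8, i.e. 250,000,000 people. A minimal base-R sketch of how R formats that number and how to force a whole-number form; the `big.mark` thousands separator is an extra readability choice, not part of the original code:

```r
pop <- 2.5e8  # the population value from the legend hint

format(pop)                                      # default formatting may use scientific notation
format(pop, scientific = FALSE, big.mark = ",")  # whole number with thousands separators

# The global option used in this task: a large scipen penalty makes R prefer fixed notation
old <- options(scipen = 999)
format(pop)
options(old)  # restore the previous setting
```

Because `options()` only changes output produced after the call, it needs to run before the plot is printed.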
  2. What are the five richest countries in the world in 2007, measured by GDP per capita?
gapminder %>%
  filter(year == 2007) %>%
  select(country, gdpPercap) %>%
  arrange(desc(gdpPercap)) %>%
  head(5)
## # A tibble: 5 × 2
##   country       gdpPercap
##   <fct>             <dbl>
## 1 Norway           49357.
## 2 Kuwait           47307.
## 3 Singapore        47143.
## 4 United States    42952.
## 5 Ireland          40676.
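The 1952 and 2007 rankings printed above can also be compared directly. A minimal base-R sketch using only values copied from those two outputs (rounded as printed), computing how each country's GDP per capita changed between the two years:

```r
# GDP per capita, rounded as printed in the 1952 and 2007 rankings above
gdp <- data.frame(
  country = c("Kuwait", "Norway"),
  y1952   = c(108382, 10095),
  y2007   = c(47307, 49357)
)

# Ratio of 2007 to 1952: below 1 means GDP per capita fell
gdp$ratio <- round(gdp$y2007 / gdp$y1952, 2)
gdp
```

By these figures, Kuwait's GDP per capita in 2007 was well under half its 1952 level, while Norway's grew almost fivefold.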
  3. Can you add a title to one or both of the animations above that changes in sync with the animation? (Hint: look up the label variables of the transition_states() and transition_time() functions, respectively.)
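library(gganimate)  # provides transition_time() and the {frame_time} label variable
anim2 <- ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop)) +
  geom_point() +
  scale_x_log10() + # convert x to log scale
  transition_time(year) +  # animate over the year variable
  labs(title = "Year: {frame_time}")  # title is filled in anew for every frame
anim2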

  4. Can you make the axes’ labels and units more readable? Consider expanding the abbreviated labels, and converting the scientific notation in the legend and on the x axis to whole numbers.
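anim2 <- ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop)) +
  geom_point() +
  scale_x_log10(labels = scales::comma) + # log scale with whole-number tick labels
  scale_size(labels = scales::comma) +    # whole numbers in the population legend
  transition_time(year) +
  labs(title = "Year: {frame_time}",
       x = "GDP per capita (US$, log scale)",
       y = "Life expectancy (years)",
       size = "Population")
anim2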

  5. Come up with a question you want to answer using the gapminder data and write it down. Then, create a data visualisation that answers the question and explain how your visualisation answers it. (Example: you might want to see what the mean life expectancy across the continents was in the year you were born versus your parents’ birth years.) [Hint: if you want more data than the filtered gapminder contains, you can either load the gapminder_unfiltered dataset or download more at https://www.gapminder.org/data/ ]

My question: Which country was the best to be born in, in 1987?
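gapminder2 <- gapminder %>%
  filter(year == 1987) %>%                          # keep only the year in the question
  mutate(lifemoneyindex = gdpPercap * lifeExp) %>%  # combined index: income times life expectancy
  select(country, lifemoneyindex, year) %>%
  arrange(desc(lifemoneyindex)) %>%
  head(5)                                           # five highest-scoring countries

gapminder2

ggplot(gapminder2, aes(x = reorder(country, lifemoneyindex), y = lifemoneyindex)) +
  geom_col() +
  coord_flip() +                                    # horizontal bars keep country names legible
  ggtitle("Best place to be born in 1987") +
  xlab("Country") + ylab("lifemoneyindex (GDP per capita * life expectancy)")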
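A note on the design of `lifemoneyindex`: because it is a product, a 10% rise in GDP per capita raises it by exactly as much as a 10% rise in life expectancy. However, GDP per capita differs between countries by far larger factors (779 versus 108,382 in 1952 in the outputs above, more than 100-fold) than life expectancy, measured in years, plausibly can, so the ranking tends to be driven mostly by income. A minimal base-R sketch of the first point, using Afghanistan's rounded 1952 values from `head(gapminder)`; `life_money_index()` is a hypothetical helper that mirrors the formula in the code above:

```r
# Hypothetical helper mirroring lifemoneyindex = gdpPercap * lifeExp
life_money_index <- function(gdp_percap, life_exp) gdp_percap * life_exp

# Afghanistan, 1952, rounded as printed by head(gapminder)
gdp  <- 779
life <- 28.8

base      <- life_money_index(gdp, life)
more_gdp  <- life_money_index(gdp * 1.10, life)   # GDP per capita up 10%
more_life <- life_money_index(gdp, life * 1.10)   # life expectancy up 10%

# Both changes scale the index by the same factor
more_gdp / base
more_life / base
```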
